A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments
Abstract
Bots are frequently used in GitHub repositories to automate repetitive activities that are part of the distributed software development process. They communicate with human actors through comments. While detecting their presence is important for many reasons, no large and representative ground-truth dataset is available, nor are classification models to detect and validate bots on the basis of such a dataset. This paper proposes a ground-truth dataset, based on a manual analysis with high interrater agreement, of pull request and issue comments in 5,000 distinct GitHub accounts, of which 527 have been identified as bots. Using this dataset, we propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns. We obtained a very high weighted average precision, recall and F1-score of 0.98 on a test set containing 40% of the data. We integrated the classification model into an open source command-line tool to allow practitioners to detect which accounts in a given GitHub repository actually correspond to bots.
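The per-account features named in the abstract can be illustrated with a minimal sketch. The abstract does not say how comment patterns are detected or how inequality is measured, so the code below makes two labeled assumptions: comments are grouped into patterns by a simple normalization (lowercasing and collapsing digits and whitespace), and inequality is measured as the Gini coefficient of the pattern sizes. The function names `normalize`, `gini`, and `account_features` are hypothetical and are not taken from the paper or its tool.

```python
from collections import Counter
import re


def normalize(comment: str) -> str:
    """Map a comment to a pattern key (assumed heuristic, not the paper's method)."""
    text = re.sub(r"\d+", "0", comment.strip().lower())  # collapse every number to "0"
    return re.sub(r"\s+", " ", text)                     # collapse runs of whitespace


def gini(sizes: list[int]) -> float:
    """Gini coefficient of group sizes; 0.0 means all groups are equally large."""
    n = len(sizes)
    total = sum(sizes)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(sizes)
    weighted = sum((i + 1) * s for i, s in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


def account_features(comments: list[str]) -> dict[str, float]:
    """Compute the four abstract-level features for one account's comments."""
    non_empty = [c for c in comments if c.strip()]
    patterns = Counter(normalize(c) for c in non_empty)
    return {
        "empty_comments": len(comments) - len(non_empty),
        "non_empty_comments": len(non_empty),
        "comment_patterns": len(patterns),
        # Assumption: inequality is the Gini coefficient of pattern sizes.
        "pattern_inequality": gini(list(patterns.values())),
    }


if __name__ == "__main__":
    sample = ["Build #12 passed", "Build #13 passed", "", "Thanks for the fix!", "  "]
    print(account_features(sample))
```

Under these assumptions, an account that posts many comments falling into few patterns would look more bot-like than one whose comments are all distinct; a real classifier would be trained on such features rather than thresholding them by hand.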
Journal
Journal title: Journal of Systems and Software
Year: 2021
ISSN: ['0164-1212', '1873-1228']
DOI: https://doi.org/10.1016/j.jss.2021.110911